Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track
The ICML Expressive Vocalizations (ExVo) Multi-task challenge 2022 focuses
on understanding the emotional facets of non-linguistic vocalizations
(vocal bursts, VB). The objective of the challenge is to predict emotional
intensities for VBs; as a multi-task challenge, it also requires predicting
the speakers' age and native country. For this challenge, we study and compare two
distinct embedding spaces, namely self-supervised learning (SSL) based
embeddings and task-specific supervised learning based embeddings. To that
end, we investigate feature representations obtained from several pre-trained
SSL neural networks and task-specific supervised classification neural
networks. Our studies show that the best performance is obtained with a hybrid
approach that uses predictions derived from both SSL and task-specific
supervised learning. Our best system on the test set surpasses the ComPARE baseline
(harmonic mean of all sub-task scores i.e., ) by a relative
margin
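
The overall score mentioned above is a harmonic mean of the sub-task scores. A minimal sketch of that aggregation follows, assuming each sub-task score has already been mapped to a positive, higher-is-better value. The function name `multitask_score`, the task keys, and the numbers are hypothetical illustrations, not the challenge's actual metrics or results.

```python
from statistics import harmonic_mean


def multitask_score(sub_task_scores):
    """Combine per-task scores into one number via their harmonic mean.

    Assumption: every score is positive and higher-is-better. The
    text above does not say how each sub-task is scored.
    """
    scores = list(sub_task_scores.values())
    if not scores:
        raise ValueError("at least one sub-task score is required")
    if any(s <= 0 for s in scores):
        raise ValueError("all sub-task scores must be positive")
    return harmonic_mean(scores)


# Hypothetical scores for the three sub-tasks named in the abstract.
example_scores = {"emotion": 0.5, "age": 0.25, "country": 0.5}
score = multitask_score(example_scores)
print(f"multi-task score: {score:.3f}")
```

Because the harmonic mean is dominated by the smallest value, a system has to do reasonably well on every sub-task to score well overall.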